Improve error checking in rt.sh #2388

Closed

Conversation

DusanJovic-NOAA
Collaborator

@DusanJovic-NOAA DusanJovic-NOAA commented Aug 5, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

Update the rt scripts to correctly check the job exit status reported by the Slurm and PBS schedulers. Fix the check of the return code from the check_results function; the code from check_results is now inlined in run_test.sh. Add a new test to error-test.conf that checks for a job wall-clock timeout error. See #2379 for details.
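
For readers unfamiliar with the problem, here is a minimal sketch of the kind of state check involved. It is not the code in rt.sh: the function name `classify_job_state` and the outcome labels are hypothetical, and only Slurm state names are covered. The idea is that a job's final state should be classified explicitly, so that a wall-clock timeout is reported as a failure and is not mistaken for a normal finish.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from rt.sh: map a Slurm job state string
# to a coarse outcome ("active", "done", "failed" or "unknown").

classify_job_state() {
  local state="$1"
  # A state may be followed by extra words, e.g. "CANCELLED by 1234";
  # keep only the first word.
  state="${state%% *}"
  # A truncated state may end in "+", e.g. "CANCELLED+"; drop that marker.
  state="${state%+}"
  case "${state}" in
    PENDING|CONFIGURING|RUNNING|COMPLETING|SUSPENDED)
      echo "active" ;;
    COMPLETED)
      echo "done" ;;
    FAILED|TIMEOUT|CANCELLED|NODE_FAIL|OUT_OF_MEMORY|BOOT_FAIL|DEADLINE|PREEMPTED)
      echo "failed" ;;
    *)
      # Empty or unrecognized input: the caller decides how to proceed.
      echo "unknown" ;;
  esac
}
```

The state string could come from something like `squeue -h -j "${jobid}" -o %T` while the job is queued or running, or from `sacct -n -X -j "${jobid}" -o State` after it has left the queue; which of these the rt scripts actually use is not stated here. Returning "unknown" for an empty string keeps a job that has simply disappeared from the queue from being counted as a success.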

Commit Message:

* UFSWM - Update rt scripts to fix checking the job exit status from Slurm and PBS schedulers.

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • No Baseline Changes.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@DusanJovic-NOAA
Collaborator Author

Regression test passed on Hera. RegressionTests_hera.log

@DusanJovic-NOAA DusanJovic-NOAA added the No Baseline Change label Aug 5, 2024
@BrianCurtis-NOAA
Collaborator

I tested this on WCOSS2 with #2183 and the full suite passed. Since there were no errors, I can't verify how it behaves when one occurs. At least it doesn't cause issues with an error-free RT suite.

@jkbk2004
Collaborator

merged with #2389

@jkbk2004 jkbk2004 closed this Aug 27, 2024
@DusanJovic-NOAA DusanJovic-NOAA deleted the rt_squeue_state branch August 30, 2024 13:59
Labels
No Baseline Change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect job status returned from squeue in rt.sh
3 participants